Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ChildWorker module #8527

Merged
merged 4 commits into from
Jun 28, 2024

Conversation

novicecpp
Copy link
Contributor

@novicecpp novicecpp commented Jun 26, 2024

Fix #8428

New ChildWorker module to spawn the child process from slave to run work() (for example, handleNewTask in Handler.py#L153).

The module can handle timeout (via SIGALARM), coredump, normal error. Then, propagate errors in form of exception back to slave, and to the caller properly.

This mode(?) is behind the feature flags config.FeatureFlags.childWorker, can enable it from TW config file.
The TW config will have additional lines:

config.section_("FeatureFlags")
config.FeatureFlags.childWorker = True
config.FeatureFlags.childWorkerTimeout = 3600 # 1 hours

@novicecpp novicecpp requested a review from belforte June 26, 2024 12:46
@novicecpp novicecpp self-assigned this Jun 26, 2024
@cmsdmwmbot

This comment was marked as outdated.

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 1 warnings
    • 41 comments to review
  • Pycodestyle check: succeeded
    • 68 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-CRABServer-PR-test/2032/artifact/artifacts/PullRequestReport.html

@belforte
Copy link
Member

shouldn't we then move to FeatureFlags config section other parameters related to how it works internally ? nslaves, polling, max_retry, retry_interval...

@novicecpp
Copy link
Contributor Author

novicecpp commented Jun 26, 2024

I put it behind feature flags so we can easily turn it off and purge it later if it does not work. I plan to move to the TaskWorker section later if we enable it permanently.

And nslaves, polling, max_retry, retry_interval... variable does not affect how the ChildWorker works except SequentialWorker where it needs to fall back to a single process for debugging with pdb.

I can move it back to the usual TaskWorker section if you do not like it.

@belforte
Copy link
Member

Beautiful !! Thanks.
Yes, we can unify config. parameters later.

Only one comment, just like you pass logger.name around, could there be a simple way to pass also the logger formatter string ?
Add to future /src/utils/TW/twUtils.py ?

@belforte
Copy link
Member

good point (about sequential test)

@cmsdmwmbot

This comment was marked as outdated.

@novicecpp novicecpp added the PR: do not merge This PR is a work in progress and not ready to be merged label Jun 26, 2024
@novicecpp novicecpp removed the PR: do not merge This PR is a work in progress and not ready to be merged label Jun 28, 2024
@novicecpp novicecpp merged commit e3f4a15 into dmwm:master Jun 28, 2024
1 check failed
@novicecpp
Copy link
Contributor Author

Sorry Stefano, forgot to answer your question.

Only one comment, just like you pass logger.name around, could there be a simple way to pass also the logger formatter string ?
Add to future /src/utils/TW/twUtils.py ?

I do not know the good way, but now I do not need it anymore (see the comments in module doc).

@belforte
Copy link
Member

thanks for the explaining the logger situation !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

in TW handle each task in its own subprocess
3 participants